5 research outputs found
Composable Deep Reinforcement Learning for Robotic Manipulation
Model-free deep reinforcement learning has been shown to exhibit good
performance in domains ranging from video games to simulated robotic
manipulation and locomotion. However, model-free methods are known to perform
poorly when the interaction time with the environment is limited, as is the
case for most real-world robotic tasks. In this paper, we study how maximum
entropy policies trained using soft Q-learning can be applied to real-world
robotic manipulation. The application of this method to real-world manipulation
is facilitated by two important features of soft Q-learning. First, soft
Q-learning can learn multimodal exploration strategies by learning policies
represented by expressive energy-based models. Second, we show that policies
learned with soft Q-learning can be composed to create new policies, and that
the optimality of the resulting policy can be bounded in terms of the
divergence between the composed policies. This compositionality provides an
especially valuable tool for real-world manipulation, where constructing new
policies by composing existing skills can provide a large gain in efficiency
over training from scratch. Our experimental evaluation demonstrates that soft
Q-learning is substantially more sample efficient than prior model-free deep
reinforcement learning methods, and that compositionality can be performed for
both simulated and real-world tasks.Comment: Videos: https://sites.google.com/view/composing-real-world-policies
MotionLM: Multi-Agent Motion Forecasting as Language Modeling
Reliable forecasting of the future behavior of road agents is a critical
component to safe planning in autonomous vehicles. Here, we represent
continuous trajectories as sequences of discrete motion tokens and cast
multi-agent motion prediction as a language modeling task over this domain. Our
model, MotionLM, provides several advantages: First, it does not require
anchors or explicit latent variable optimization to learn multimodal
distributions. Instead, we leverage a single standard language modeling
objective, maximizing the average log probability over sequence tokens. Second,
our approach bypasses post-hoc interaction heuristics where individual agent
trajectory generation is conducted prior to interactive scoring. Instead,
MotionLM produces joint distributions over interactive agent futures in a
single autoregressive decoding process. In addition, the model's sequential
factorization enables temporally causal conditional rollouts. The proposed
approach establishes new state-of-the-art performance for multi-agent motion
prediction on the Waymo Open Motion Dataset, ranking 1st on the interactive
challenge leaderboard.Comment: To appear at the International Conference on Computer Vision (ICCV)
202